Deep learning-based instance segmentation for the precise automated quantification of digital breast cancer immunohistochemistry images

Priego-Torres, Blanca Maria, Lobato-Delgado, Barbara, Atienza-Cuevas, Lidia, Sanchez-Morillo, Daniel

arXiv.org Artificial Intelligence

The quantification of biomarkers on immunohistochemistry breast cancer images is essential for defining appropriate therapy for breast cancer patients, as well as for extracting relevant information on disease prognosis. This is an arduous and time-consuming task that may introduce a bias in the results due to intra- and inter-observer variability, which could be alleviated by using automatic quantification tools. However, this is not a simple processing task, given the heterogeneity of breast tumors, which results in non-uniformly distributed tumor cells that differ in staining color and intensity and in the size, shape, and texture of the nucleus, cytoplasm, and membrane. In this research work, we demonstrate the feasibility of using a deep learning-based instance segmentation architecture for the automatic quantification of both nuclear and membrane biomarkers applied to IHC-stained slides. We have solved the cumbersome task of training set generation with the design and implementation of a web platform, which has served as a hub for communication and feedback between researchers and pathologists as well as a system for the validation of the automatic image processing models. Through this tool, we have collected annotations over samples of HE, ER and Ki-67 (nuclear biomarkers) and HER2 (membrane biomarker) IHC-stained images. Using the same deep learning network architecture, we have trained two models, the so-called nuclei- and membrane-aware segmentation models, which, once successfully validated, have proven to be a promising method to segment nuclei instances in IHC-stained images. The quantification method proposed in this work has been integrated into the developed web platform and is currently being used as a decision-support tool by pathologists.


GLEN: General-Purpose Event Detection for Thousands of Types

Zhan, Qiusi, Li, Sha, Conger, Kathryn, Palmer, Martha, Ji, Heng, Han, Jiawei

arXiv.org Artificial Intelligence

The progress of event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build a general-purpose event detection dataset GLEN, which covers 205K event mentions with 3,465 different types, making it more than 20x larger in ontology than today's largest event dataset. GLEN is created by utilizing the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This enables us to use the abundant existing annotation for PropBank as distant supervision. In addition, we also propose a new multi-stage event detection model CEDAR specifically designed to handle the large ontology size in GLEN. We show that our model exhibits superior performance compared to a range of baselines including InstructGPT. Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance for this new dataset. Our dataset, code, and models are released at https://github.com/ZQS1943/GLEN.


Development of Authenticated Clients and Applications for ICICLE CI Services -- Final Report for the REHS Program, June-August, 2022

Samar, Sahil, Chen, Mia, Karpinski, Jack, Ray, Michael, Sarin, Archita, Garcia, Christian, Lange, Matthew, Stubbs, Joe, Thomas, Mary

arXiv.org Artificial Intelligence

The Artificial Intelligence (AI) institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) is funded by the NSF to build the next generation of Cyberinfrastructure to render AI more accessible to everyone and drive its further democratization in the larger society. We describe our efforts to develop Jupyter Notebooks and Python command line clients that access these ICICLE resources and services using ICICLE authentication mechanisms. To connect our clients, we used Tapis, which is a framework that supports computational research to enable scientists to access, utilize, and manage multi-institution resources and services. We used Neo4j to organize data into a knowledge graph (KG). We then hosted the KG on a Tapis Pod, which offers persistent data storage with a template made specifically for Neo4j KGs. To demonstrate the capabilities of our software, we developed several clients: a Jupyter notebook for authentication, a Neural Networks (NN) notebook, and command line applications that provide a convenient frontend to the Tapis API. In addition, we developed a data processing notebook that can manipulate KGs on the Tapis servers, including creating a KG and uploading and modifying data. In this report we present the software architecture, design and approach, the success of our client software, and future work.